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Abstract 

In many signal processing problems, it may be fruitful to represent the signal under study in a 
frame. If a probabilistic approach is adopted, it becomes then necessary to estimate the hyper-parameters 
characterizing the probability distribution of the frame coefficients. This problem is difficult since in 
general the frame synthesis operator is not bijective. Consequently, the frame coefficients are not directly 
observable. This paper introduces a hierarchical Bayesian model for frame representation. The posterior 
distribution of the frame coefficients and model hyper-parameters is derived. Hybrid Markov Chain Monte 
Carlo algorithms are subsequently proposed to sample from this posterior distribution. The generated 
samples are then exploited to estimate the hyper-parameters and the frame coefficients of the target 
signal. Validation experiments show that the proposed algorithms provide an accurate estimation of the 
frame coefficients and hyper-parameters. Application to practical problems of image denoising show the 
impact of the resulting Bayesian estimation on the recovered signal quality. 
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I. Introduction 

Data representation is a crucial operation in many signal and image processing applications. These 
applications include signal and image reconstruction [1,2] , restoration [3,4] and compression [5,6]. In 
this respect, many linear transforms have been proposed in order to obtain suitable signal representations 
in other domains than the original spatial or temporal ones. The traditional Fourier and discrete cosine 
transforms provide a good frequency localization, but at the expense of a poor spatial or temporal localiza- 
tion. To improve localization both in the spatial/temporal and frequency domains, the wavelet transform 
(WT) was introduced as a powerful tool in the 1980's [7]. Many wavelet-like basis decompositions have 
been subsequently proposed offering different features. For instance, we can mention the wavelet packets 
[8] or the grouplet bases [9]. To further improve signal representations, redundant linear decomposition 
families called frames have become the focus of many works during the last decade. For the sake of 
clarity, it must be pointed out that the term frame [10] is understood in the sense of Hilbert space theory 
and not in the sense of some recent works like [11]. 

The main advantage of frames lies in their flexibility to capture local features of the signal. Hence, 
they may result in sparser representations as shown in the literature on curvelets [10], bandelets [12] 
or dual-trees [13] in image processing. However, a major difficulty when using frame representations 
in a statistical framework is to estimate the parameters of the frame coefficient probability distribution. 
Actually, since frame synthesis operators are generally not injective, even if the signal is perfectly known, 
the determination of its frame coefficients is an underdetermined problem. 

This paper studies a hierarchical Bayesian approach to estimate the frame coefficients and their hyper- 
parameters. Although this approach is conceptually able to deal with any desirable distribution for the 
frame coefficients, we focus in this paper on generalized Gaussian (GG) priors. Note however that we do 
not restrict our attention to log-concave GG prior probability density functions (pdf), which may be limited 
for providing accurate models of sparse signals [14]. In addition, the proposed method can be applied to 
noisy data when imprecise measurements of the signal are only available. Our work takes advantage of the 
current developments in Markov Chain Monte Carlo (MCMC) algorithms [15-17] that have already been 
investigated for instance in image separation [18], image restoration [19] and brain activity detection 
in functional MRI [20,21]. These algorithms have also been investigated for signal/image processing 
problems with sparsity constraints. These constraints may be imposed in the original space like in [22], 
where a sparse image reconstruction problem is assessed in the image domain. They may also be imposed 
on some redundant representation of the signal like in [23], where a time-series sparse coding problem 
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is addressed. 

Hybrid MCMC algorithms [24,25] are designed combining Metropolis-Hastings (MH) [26] and Gibbs 
[27] moves to sample according to the posterior distribution of interest. MCMC algorithms and WT 
have been jointly investigated in some works dealing with signal denoising in a Bayesian framework 
[18,28-30]. However, in contrast with the present work where overcomplete frame representations are 
considered, these works are limited to wavelet bases for which the hyper-parameter estimation problem 
is much easier to handle. 

This paper is organized as follows. Section JI] presents a brief overview on the concepts of frame and 
frame representation. The hierarchical Bayesian model proposed for frame representation is introduced 
in Section [Hi] Two algorithms for sampling the posterior distribution are proposed in Section [TV] To 
illustrate the effectiveness of these algorithms, experiments on both synthetic and real world data are 
presented in Section [V] In this section, applications to image recovery problems are also considered. 
Finally some conclusions are drawn in Section [VT] 

II. Problem Formulation 

A. The frame concept 

In the following, we will consider real-valued digital signals of length L as elements of the Euclidean 
space M L endowed with the usual scalar product and norm denoted as (.|.) and || • ||, respectively. Let K 
be an integer greater than or equal to L. A family of vectors (ek)-\<k<K in the finite-dimensional space 
R L is a frame when there exists a constant p in ]0, +oo[ such thau 

K 

VyeK 1 , p\\y\? <^|(y|e fc )| 2 . (1) 

it=i 

'The classical upper bound condition is always satisfied in finite dimension. In this case, the frame condition is also equivalent 
to saying that the frame operator has full rank L. 
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If the inequality £[]) becomes an equality, (ek)i<k<K is called a tight frame. The bounded linear frame 
analysis operator F and the adjoint synthesis frame operator F* are defined as 

F : R L -> R K (2) 

y ^ ((y|efe})i<fc<K, 

F* : R K -> R L (3) 
K 

{£k)l<k<K >-> ^ 6c e fc- 
k=l 

Note that F is injective whereas F* is surjective. When F~ l = F*, {ek)k&L is an orthonormal basis. A 
simple example of a redundant frame is the union of M > 1 orthonormal bases. In this case, the frame 
is tight with \i = M and thus, we have F*F = Ml where I is the identity operator. 

B. Frame representation 

An observed signal y G R L can be written according to its frame representation (FR) involving 
coefficients x G R K as follows 

y = F*x + n (4) 

where n is the error between the observed signal y and its FR F*x. This error is modeled by imposing 
that x belongs to the closed convex set 

C s = {x G R K | N(y - F*x) < 5} (5) 

where 5 G [0, oo[ is some error bound and N(.) can be any norm on R L . 

In signal/image recovery problems, n is nothing but an additive noise that corrupts the measured data. 
By adopting a probabilistic approach, y and x are assumed to be realizations of random vectors Y and 
X. In this context, our goal is to characterize the probability distribution of X|Y", by considering some 
parametric probabilistic model and by estimating the associated hyper-parameters. 
A useful example where this characterization may be of great interest is frame -based signal/image 
denoising in a Bayesian framework. Actually, denoising in the wavelet domain using wavelet frame 
decompositions has already been investigated since the seminal work [31] as this kind of representation 
provides sparse description of regular signals. The related hyper-parameters have then to be estimated. 

When F is bijective and 5 = 0, this estimation can be performed by inverting the transform so as 
to deduce x from y and by resorting to standard estimation techniques on x. However, as mentioned 
in Section III-Al for redundant frames, F* is not bijective, which makes the hyper-parameter estimation 
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problem more difficult since deducing x from y is no longer unique. This paper presents hierarchical 
Bayesian algorithms to address this issue. 

III. Hierarchical Bayesian Model 

In a Bayesian framework, we first need to define prior distributions for the frame coefficients. For 
instance, this prior may be chosen so as to promote the sparsity of the representation. In the following, 
f(x\6) denotes the pdf of the frame coefficients that depends on an unknown hyper-parameter vector 
6 and f(0) is the a priori pdf of the hyper-parameter vector 6. In compliance with the observation 
model dU), f(y\x) is a uniform distribution on the closed convex set D$ defined as 

D S = {y G R L j N(y - F*x) < 5} (6) 

where S > 0. Denoting by the random variable associated with the hyper-parameter vector and using 
the hierarchical structure between Y,X and ©, the conditional distribution of (X, 0) given Y can be 
written as 

f(x,0\y)ocf(y\x)f(x\0)f(0) (7) 

where oc means proportional to. 

In this work, we assume that frame coefficients are a priori independent with marginal GG distributions. 
This assumption has been successfully used in many studies [32-37] and leads to the following frame 
coefficient prior 

where > 0,/3& > (with k G {1, . . . ,K}) are the scale and shape parameters associated with x^, 
which is the kth component of the frame coefficient vector x and T(.) is the Gamma function. Note that 
small values of the shape parameters are appropriate for modelling sparse signals. When Vfc G {1, . . . , K}, 
fa = 1, a Laplace prior is obtained, which was shown to play a central role in sparse signal recovery 
[38] and compressed sensing [39]. 
By introducing Vfc G {1, . . . , K}, 7^ = a^ k , the frame prior can be rewritten aj^| 

fa ( Vkf k 



f(xk\lk,Pk) = — Tjn ex P • ( 9) 

The distribution of a frame coefficient generally differs from one coefficient to another. However, some 
frame coefficients can have very similar distributions (that can be defined by the same hyper-parameters 

2 The interest of this new parameterization will be clarified in Section ITVl 
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Pk and 7^). As a consequence, we propose to split the frame coefficients into G different groups. The 
gfh group will be parameterized by a unique hyper-parameter vector denoted as 6 g = (Pg,jg) (after the 
reparameterization mentioned above). In this case, the frame prior can be expressed as 

G 



me) = n 

9=1 



fig 



,2 7 y /3 T(i/ / 3 g ) / v , ykeSg 

where the summation covers the index set S g of the elements of the gth group containing n g elements and 
6 = . . . , Oq). Note that in our simulations, each group g will correspond to a given wavelet subband. 
A coarser classification may be made when using multiscale frame representations by considering that 
all the frame coefficients at a given resolution level belong to a same group. 

The hierarchical Bayesian model for the frame decomposition is completed by the following improper 
hyperprior 

G 

/w = n w 

9=1 
G 

= n [f{i 9 )f%)\ 

9=1 
G 



cxp 




(10) 



9=1 



1 



lR+(7g)l[0,3](/3g 



(11) 



where for a set A C 



U(0 



(12) 



1 if f G A 
otherwise. 

The motivations for using this kind of prior are summarized below: 

• the interval [0, 3] covers all possible values of (3 g encountered in practical applications. Moreover, 
there is no additional information about the parameter /3 g . 

• The prior for the parameter j g is a Jeffrey's distribution that reflects the absence of knowledge about 
this parameter [40]. This kind of prior is often used for scale parameters. 

The resulting posterior distribution is therefore given by 



G 



f{x,e\y) oc i c A x ) n 

9=1 



rzg-^ ; : J ex P \ Xk ^ 9 — 1 k+(79) 1 [o,3](/3 9 ) 

2^^(1/(3,)] \ 79^ J la 



(13) 



The Bayesian estimators (e.g., the maximum a posteriori (MAP) or minimum mean square error 
(MMSE) estimators) associated with the posterior distribution ([T3l have no simple closed-form expression. 
The next section studies different sampling strategies for generating samples distributed according to the 
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posterior distribution ([T3T ). The generated samples will be used to estimate the unknown model parameter 
and hyper-parameter vectors x and 6. 

IV. Sampling strategies 

This section proposes different MCMC methods to generate samples distributed according to the 
posterior f(x,0\y) defined in (fl"3l ). 

A. Hybrid Gibbs Sampler 

A very standard strategy to sample according to ([7]) is provided by the Gibbs sampler. The Gibbs 
sampler iteratively generates samples distributed according to conditional distributions associated with 
the target distribution. More precisely, the basic Gibbs sampler iteratively generates samples distributed 
according to f(x\0,y) and f(6\x,y). 

1) Sampling the frame coefficients: 
Straightforward calculations yield the following conditional distribution 



where Cg is defined in (f5]). This conditional distribution is a product of GG distributions truncated on 
C§. Actually, sampling according to this truncated distribution is not always easy to perform since the 
adjoint frame operator F* is usually of large dimension. However, two alternative sampling strategies 
are detailed in what follows. 

a) Naive sampling: 

This sampling method proceeds by sampling according to independent GG distributions 



and then accepting the proposed candidate x only if N(y — F*x) < 5. This method can be used for any 
frame decomposition and any norm. However, it can be quite inefficient because of a very low acceptance 
ratio, especially when 5 takes small values. 
b) Gibbs sampler: 

This sampling method is designed to sample more efficiently from the conditional distribution in (fl4l) 
when the considered frame is the union of M orfhonormal bases and N(.) is the Euclidean norm. In 




(14) 




(15) 
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this case, the analysis frame operator and the corresponding adjoint can be written as F 



Fi 



F M 



and 



F* = [F£ . . . F^j], respectively, where Vm G {1, . . . , M}, F m is the decomposition operator onto the 
mth orthonormal basis such as F^Fm = F m F^ = I. 

Every x € R K with K = ML, can be decomposed 3.S X — 5***1 ^ hi 

} T where Vm G {1, . . . , M}, 

X m G R L . 



The Gibbs sampler for the generation of frame coefficients draws vectors according to the conditional 
distribution f(x n \x_ n , y, 6) under the constraint N(y — F*x) < 5, where £c_ n is the reduced size vector 
of dimension R K ~ L built from x by removing the nth vector x n . If iV(.) is the Euclidean norm, we 
have Vn G {1, . . . ,M}, 

AI 

N(y - £ F> m ) < 5 

m=l 

M 

& || F*(F n y - F n F^x rn ) \\< 5 

m=l 

O || F n y- F n F^x m -x n || < 5 (since Vz G M L , || F*z ||=|| z ||) 

<^iV(a; n - c n ) < 5, (16) 

where 

Cn — F n ^y ^ ^ F m x n i^j . 

Having x_ n = (x m ) m ^ n , it is thus easy to compute the vector c n . To sample each x n , we propose to 
use an MH step whose proposal distribution is supported on the ball B Cn : $ defined by 

B Cn j = {a G R L | N(a - c n ) < 5}. (17) 

Random generation from a pdf q$ defined on B s which has a simple expression is described in 

Appendix |A) Having a closed form expression of this pdf is important to be able to calculate the 

(i—l) 

acceptance ratio of the MH move. To take into account the value of x n obtained at the previous 
iteration (i — l), it may however be preferable to choose a proposal distribution supported on a restricted 
ball of radius r] G]0,5[ containing Xn ^ ■ This strategy similar to the random walk MH algorithm [15, 
p. 287] results in a better exploration of regions associated with large values of the conditional distribution 
f(x\6,y). 
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More precisely, we propose to choose a proposal distribution defined on B^a-i) r] , where x£ ^ 
P(xn ^ — Cn) + Cn and P is the projection onto the ball B^^-q defined as 



Va G R , P(a) 



a if N(a) < S - 77 

r (18) 

— -—a otherwise. 
[N(a) 

This choice of the center of the ball guarantees that v C B Cnt $. Moreover, any point of -B c „,<5 can 

be reached after consecutive draws in B^a-i) ?? . Note that the radius 7/ has to be adjusted to ensure a 
good exploration of B Cn $. In practice, it may also be interesting to fix a small enough value of rj so as 
to improve the acceptance ratio. 
Remark: 

Alternatively, a Gibbs sampler can be used to draw successively the L elements (i B ,i)i<i<L of x n under 
the following constraint 

|| Cn || — & 



$ 2 - ^2(x n ,k - c n , k ) 2 < x n ,i - c n j < 6 2 - ^(x„ )(; - c n>k ) 2 , Vle{l,...,L} 

V k + i V k + i 



(19) 



where c n ^ is the /cth element of the vector c n (see [41, p. 133] for related strategies). However, this method 
is very time-consuming since it proceeds sequentially for each component of the high dimensional vector 
x. 

2 ) Sampling the hyper-parameter vector: 
Instead of sampling 6 according to f(0\x, y), we propose to iteratively sample according to /(7 9 |/3 9 , x, y) 
and f((3g\^f g ,x,y). Straightforward calculations allow us to obtain the following results 

f(-y g \P g ,x,y) oc7;^ _1 exp -— £ \x k f' 1 R+ (7 S ), (20) 

\ 19 kes g J 

Consequently, due to the new parameterization introduced in ©, f("fg\/3g, x,y) is the pdf of the inverse 
gamma distribution TQ (jf-,2~2kes \ x k\^ 9 ^j that is eas y to sample. Conversely, it is more difficult to 
sample according to the truncated pdf f(P g \^/ g ,x,y). This is achieved by using an MH move whose 
proposal q(P g \ Pg ) is a Gaussian distribution truncated on the interval [0,3] with standard deviation 
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a/3 g = 0.05 [42]. Note that the mode of this distribution is the value of the parameter [3 g at the 
previous iteration — 

The resulting method is the hybrid Gibbs sampler summarized in Algorithm 1. 



© Initialize with some = {9 { g ] )i< g < G = {lf\ Pf ] )i<g<G and x^ € C s , and set i = 1. 
© Sampling x 
For n = 1 to M 



- Compute c { n ] = F n (y - J2 m <n F m x ™ - E m >n f >™ ^) 



and xt 1) = Pi^t V) -c£)+ctf. 

(i) 

- Simulate follows: 

* Generate x$ ~ q v (x n — x^~^) where q v is defined on Bq^ (see Appendix EJ). 

* Compute the ratio 

r{Xn ' Xn > /(^- i 'i^-), (3: «) m< „,( a; ;r 1 ') m >„, y ) g „(s<» 

and accept the proposed candidate with the probability min{l, r(x^\ Xn ^)}. 
Sampling 6 
For g = 1 to G 



- Generate lg - ZQ Ekes g \ x k 

(i) 

- Simulate [5 g as follows: 

* Generate J$ - q(f3 g \ 

* Compute the ratio 

and accept the proposed candidate with the probability min{l, r(P g l \ f3 g l 
Set i <— i + 1 and goto © until convergence. 



Algorithm 1: Proposed Hybrid Gibbs sampler to simulate according to f(x,0\y) (superscript -W 
indicates values computed at iteration number i). 



Although this algorithm is intuitive and simple to implement, it must be pointed out that it was derived 
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under the restrictive assumption that the considered frame is the union of M orfhonormal bases. When 
this assumption does not hold, another algorithm proposed in the next section allows us to sample frame 
coefficients and the related hyper-parameters by exploiting algebraic properties of frames. 

B. Hybrid MH sampler using algebraic properties of frame representations 

As a direct generation of samples according to f(x\0, y) is generally impossible, we propose here an 
alternative that replaces the Gibbs move by an MH move. This MH move aims at sampling globally a 
candidate x according to a proposal distribution. This candidate is accepted or rejected with the standard 
MH acceptance ratio. The efficiency of the MH move strongly depends on the choice of the proposal 
distribution for x. We denote as a;W the ith accepted sample of the algorithm and q(x | a^*" 1 )) the 
proposal that is used to generate a candidate at iteration i. The main difficulty for choosing q(x j a?(' i_1 )) 
stems from the fact that it must guarantee that x € C$ (as mentioned in Section III-B b while yielding a 
tractable expression of qix^-V | x)/q(x | ajf* -1 )). 

For this reason, we propose to exploit the algebraic properties of frame representations. More precisely, 
any frame coefficient vector can be decomposed as x = xh + Xjj±, where xh and x H ± are realizations 
of random vectors taking their values in H = Ran(F) and H ± = [Ran(F)]- L = Null(F*), respectively]] 
The proposal distribution used in this paper allows us to generate samples xh £ H and x^± £ H L . 
More precisely, the following separable form of the proposal pdf will be considered 

q(x | a: w ) = q (x H \ x^^J q (x H ± \ (22) 

where x H l ' £ H, x H ^ £ H ± and a;^" 1 ) = x^ ^ + x^± \ In other words, independent sampling of 
xh and x^± will be performed. 

If we consider the decomposition x = xh + xh±, sampling x in C$ is equivalent to sampling A € C5, 
where C s = {Xe R L \N(y - F*FX) < 5} is the inverse image of C$ under F. 

Indeed, we can write xh = FX where AgI 1, and, since x H ± e Null(F*), F*x = F*FX. Sampling 
A in Cs can be easily achieved, e.g., by generating u from a distribution supported on the ball B y s and 
by taking A = (F*F)- 1 u. 

To make the sampling of xh at iteration i more efficient, taking into account the sampled value at 
the previous iteration x H ^ = FA^" 1 ^ = F(F*F)~ 1 it^~ 1 ^ may be interesting. Similarly to Sec- 
tion HV-Alb ). and to random walk generation techniques, we proceed by generating randomly u in 

We recall that the range of F is Ran(F) = {x G R K \3y G K L , Fy = x} and the null space of F* is Null^*) = {x G 
R K \F*x = 0}. 
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Bjfi-i) where rj g]0, S[ and u^ % ~ 1 ^ = P(u^~ 1 ^ — y) + y. This allows us to draw a vector u such that 
= F(F* F)~ 1 u 6 C5 and iV(u — tt^ -1 )) < 2?/. The generation of it can then be performed as 
explained in Appendix lAl provided that N(.) is an l v norm with p € [1, +00]. 

Once we have simulated xh = FX 6 H n C5 (which ensures that a; is in C5), s^i has to be sampled 
as an element of H ± . Since y = F*x + n = F*xh + n, there is no information in y about xjj±. 
As a consequence, and for simplicity reasons, we propose to sample xh by drawing z according to the 
Gaussian distribution M{x^ l ~ 1 \ a^I) and by projecting z onto H- 1 , i.e., 

xh± = %±2 (23) 

where Hh^ = I — F(F*F)~ l F* is the orthogonal projection operator onto H- 1 . 

Note here that using a tight frame makes the computation of both xh and xjj± much easier due to 
the relation F*F = fil. 

Let us now derive the expression of the proposal pdf. It can be noticed that, if K > L, there exists a 
linear operator F± from M> K ~ L to M. L which is semi-orthogonal (i.e., F^_F± = I) and orthogonal to F 
(i.e., F*F = 0), such that 

x = FX, + F ± X ± (24) 

and X± = F ± x £ M> K ~ L . Standard rules on bijective linear transforms of random vectors lead to 

q{x I x^-V) = I det ([F F x }) | -1 g(A | x (i_:l) )g(A_L | x {i ~ l) ) (25) 

where, due to the bijective linear mapping between A and u = F*FX 

q(X I x^-V) = det(FF*) q v (u - u (i-1) ) (26) 

and q(X ± | aj (i_1) ) is the pdf of the Gaussian distribution M (X^ 1 ^, a x I) with mean A^ _1) = Flx^~ l \ 
Recall that q v denotes a distribution on the ball Bq v as expressed in Appendix |A] Due to the symmetry 
of the Gaussian distribution, it can be deduced that 

g(g (i ~ 1} I a) = - P(u -y)-y) 

This expression remains valid in the degenerate case when K = L (yielding x^± = 0). Finally, it is 
important to note that, if q v can be chosen as a uniform distribution on the ball Bo v , the above ratio 
reduces to 1, which simplifies the computation of the MH acceptance ratio. 

The final algorithm is summarized in Algorithm 2. Note that the sampling of the hyper-parameter vector 
is performed as for the hybrid Gibbs sampler in Section IIV-A2I 
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© Initialize with some 0<°) = (0g O) )i< 9 <G = (7^, /5g 0) )i< s <G and «(°) G Set as(°) 

F^F^uW and i = 1. 
© Sampling a; 

- Compute tt (i_1) = P^*" 1 ) -y)+y. 

- Generate ^ q v (u — tt^ -1 ^) where efy is defined on £>o,»? (see Appendix lAb. 

- Compute x^ = F (F* F)~ l u {i) . 

- Generate z<*) - Mix^, a 2 x I). 



- Compute Xjj± = ILgxz® and »W = _|_ 



= JT-' *® ^ ^ W - ^ -J- ~ W 

- Compute the ratio 



(0 _ /(ggV^y) gqfoP- 1 * -P(g (i) -y) -y) 



/(ajC^l^.yJg^aW-^*- 1 )) 
and accept the proposed candidates vS 1 ' and ar 1 ' with probability 
min{l, r{x^\x^~^)}. 
Sampling 6 
For g = 1 to G 

- Generate 7 « - J£ (^ TT , E feeSs I'W'' 

- Simulate [3 g as follows 

* Generate J>f - 9 (0 fl | /fl£ i-1) ) 

* Compute the ratio 



and accept the proposed candidate with the probability mm{l,r(/3g ,{3g ^)}. 
Set i <— z + 1 and goto © until convergence. 



Algorithm 2: Proposed Hybrid MH sampler using algebraic properties of frame representations 
to simulate according to f(x,0\y). 



Experimental estimation results and applications to some image denoising problems of the proposed 
stochastic sampling techniques are provided in the next section. 
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V. Simulation Results 

A. Validation experiments 
1) Example 1: 

To show the effectiveness of our algorithm, a first set of experiments was carried out on synthetic 
images. As a frame representation, we used the union of two 2D separable wavelet bases B\ and £>2 
using Daubechies and shifted Daubechies filters of length 8 and 4, respectively. The £2 norm was used 
for N(-) in dU) with S = 10~ 4 . To generate a synthetic image, we synthesized wavelet frame coefficients 
x from known prior distributions. 

Let a: 1 = (ai, (h\j, vij, dij)i<j<2) and 052 = (02, (h2,j,V2,j,d2,j)i<j<2) be the sequences of wavelet 
basis coefficients generated in B\ and B2, where a, h, v, d stand for approximation, horizontal, vertical 
and diagonal coefficients and the index j designates the resolution level. Wavelet frame coefficients have 
been generated from a GG distribution in accordance with the chosen priors. The coefficients in each 
subband have been modeled with the same values of the hyper-parameters a g and f3 g , which means 
that each subband forms a group of index g. The number of groups (i.e. the number of subbands) G is 
therefore equal to 14. A uniform prior distribution over [0, 3] has been chosen for parameter (3 g whereas 
a Jeffrey's prior has been assigned to each parameter y g . 

After generating the hyper-parameters from their prior distributions, a set of frame coefficients is 
randomly generated to synthesize the observed data. The hyper-parameters are then supposed unknown, 
sampled using the proposed algorithm, and estimated by computing the mean of the generated samples 
according to the MMSE principle. Having reference values, the normalized mean square erors (NMSEs) 
related to the estimation of each hyper-parameter belonging to a given group (here a given subband) have 
been computed from 30 Monte Carlo runs. The NMSEs computed for the estimators associated with the 
two samplers of Sections IIV-AI and IIV-BI are reported in Table U 



November 16, 2009 



DRAFT 



15 

TABLE I 

NMSES FOR THE ESTIMATED HYPER-PARAMETERS (30 RUNS). 





NMSE 


Sampler 1 


Sampler 2 


/3 


a 


P 


a. 


hi,i 


0.015 


0.006 


0.012 


0.030 


Vl,l 


0.022 


0.021 


0.022 


0.026 




0.06 


0.016 


0.011 


0.044 


hi,2 


0.04 


0.003 


0.021 


0.026 


Vl,2 


0.020 


0.027 


0.020 


0.019 




0.013 


0.016 


0.023 


0.041 


ai 


0.039 


0.08 


0.039 


0.023 




0.015 


0.030 


0.015 


0.025 


«2,1 


0.051 


0.07 


0.025 


0.031 


^2,1 


0.027 


0.039 


0.029 


0.023 


h.2,2 


0.040 


0.024 


0.016 


0.034 


V2,2 


0.08 


0.019 


0.013 


0.022 


d.2,2 


0.05 


0.015 


0.011 


0.040 


0.2 


0.010 


0.064 


0.010 


0.028 



Table U shows that the proposed algorithms (using Sampler 1 of Section IIV-AI and Sampler 2 of 
Section HV-Bb provide accurate estimates of the hyper-parameters. The two samplers perform similarly 
for this experiment. However, one advantage of Sampler 2 is that it can be applied to different kinds 
of redundant frames, unlike Sampler 1. Indeed, as reported in Section HV-AI the conditional distribution 
(fl4l) is generally difficult to sample when the frame representation is not the union of orthonormal bases. 

Two examples of empirical histograms of known reference wavelet frame coefficients (corresponding 
to B\) and pdfs with estimated hyper-parameters are plotted in Fig. [T]to illustrate the good performance 
of the estimator. 
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an: P = 1.7, 7 = 104 ft lj2 : /? = 1.98, 7 = 143.88 




Fig. 1. Examples of empirical approximation (left) and detail (right) histograms and pdfs of frame coefficients corresponding 
to a synthetic image. 

2) Example 2: 

In this experiment, another frame representation is considered, namely a tight frame version of the 
translation invariant wavelet transform [43] with Daubechies filters of length 8. The £2 norm was also 
used for N(.) in (0]) with S = 10~ 4 . Let x = (a, (hj,Vj,dj)i<j<2) denote the frame coefficients vector. 
We used the same process to generate frame coefficients as for Example 1. The coefficients in each 
subband (i.e. each group) have been modeled with the same values of the hyper-parameters 7 5 and /3 g , 
the number of groups being equal to 7. The same priors for the hyper-parameters 7 9 and (3 g as for 
Example 1 have been used. 

After generating the hyper-parameters and frame coefficients, the hyper-parameters are then supposed 
unknown, sampled using the proposed algorithm, and estimated using the MMSE estimator. Table [TT] 
shows NMSEs based on reference values of each hyper-parameter. Note that Sampler 1 is difficult to 
be implemented in this case because of the used frame properties. Consequently, only NMSE values for 
Sampler 2 have been reported in Table HTl 

B. Convergence results 

To be able to automatically stop the simulated chain and ensure that the last simulated samples are 
appropriately distributed according to the posterior distribution of interest, a convergence monitoring 
technique based on the potential scale reduction factor (PSRF) has been used by simulating several chains 
in parallel (see [44] for more details). Using the union of two orthonormal bases as a frame representation, 
Figs. [2] and [3] show examples of convergence profiles corresponding to the hyper-parameters and 7 
when two chains are sampled in parallel using Sampler 2. 

Based on these values of the PSRF, the algorithm was stopped after about 150, 000 iterations (burn-in 
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TABLE II 

NMSES FOR THE ESTIMATED HYPER-PARAMETERS USING SAMPLER 2 (30 RUNS). 





NMSE 


P 


a 


hi 


0.05 


0.027 


Vl 


0.024 


0.007 


di 


0.05 


0.014 


h 2 


0.037 


0.028 


V2 


0.051 


0.044 


d 2 


0.04 


0.012 


a 


0.04 


0.05 



period of 100, 000 iterations), which corresponds to about 4 hours of computational time using Matlab 7.7 
on an Intel Core 4 (3 GHz) architecture. When comparing the two proposed samplers in terms of 
convergence speed, it turns out from our simulations that Sampler 1 shows faster convergence than 
Sampler 2. Indeed, Sampler 1 needs about 110,000 iterations to converge, which reduces the global 
computational time to about 3 hours. 



Chain 1 Chain 2 




(3 = 2,3 - PSRF = 1.02 




7 = 185 - PSRF = 0.96 
Fig. 2. Ground truth values and sample path for the hyper-parameters j3 and 7 related to vi t i in Bi. 
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Chain 1 



Chain 2 




P = 1.81 - PSRF = 0. 





7 = 31.5 - PSRF = 1.03 
Fig. 3. Ground truth values and sample path for the hyper-parameters j3 and 7 related to v 2 ,2 in 82- 



The posterior distributions of the hyper-parameters (3 and 7 related to the subbands h\£ and ^2,2 m 
£>i and B2 introduced in Section IV-A1I are shown in Fig. 01 as well as the known original values. It is 
clear that the mode of the posterior distributions is around the ground truth value, which confirms the 
good estimation performance of the proposed approach. 

Note that when the resolution level increases, the number of subbands also increases, which leads to a 
higher number of hyper-parameters to be estimated and a potential increase of the required computational 
time to reach convergence. For example, when using the union of two orthonormal wavelet bases with 
two resolution levels, the number of hyper-parameters to estimate is 28. 
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C. Application to image denoising 
1) Example 1: 

In this experiment, we are interested in recovering an image (the Boat image of size 256 x 256) from its 
noisy observation affected by a noise n uniformly distributed over the ball [— 5, 5] 256x256 with 5 = 30. 
We recall that the observation model for this image denoising problem is given by (@]). The noisy image 
in Fig. |5](b) was simulated using the available reference image y Tc{ in Fig. [5] (a) and the noise properties 
described above. 

The union of two 2D separable wavelet bases B\ and £>2 using Daubechies and shifted Daubechies 
filters of length 8 and 4 (as for validation experiments in Section IV-AI) was used as a tight frame 
representation. Denoising was performed using the MMSE denoted as x computed from sampled wavelet 
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frame coefficients. The adjoint frame operator is then applied to recover the denoised image from its 
denoised estimated wavelet frame coefficients (y = F*x). The obtained denoised image is depicted in 
Fig. 0(d). For comparison purpose, the denoised image using a variational approach [45,46] based on 
a MAP criterion using the estimated values of the hyper-parameters with our approach is illustrated in 
Fig- (c). This comparison shows that, for denoising purposes, the proposed method gives better visual 
quality than the other reported methods. 

Signal to noise ratio (SNR = 201og 10 (||j/ re f ||/||y re f — 2/11)) ar >d structural similarity (SSIM) [47] values 
are also given in Table [III] to quantitatively evaluate denoising performance. Note here that SSIM values 
must lie in [0,1], high values indicating good image quality. 

An additional comparison with respect to Wiener filtering is given in this table. The SNR and SSIM 
values are given for three additional test images with different textures and contents to better illustrate 
the good performance of the proposed approach. The corresponding original, noisy and denoised images 
are displayed in Figs. [6j [7] and [8j 

TABLE III 

SNR AND SSIM VALUES FOR THE NOISY AND DENOISED IMAGES. 





Noisy 


Wiener 


Variational 


MCMC 


Boat 


SNR (dB) 


16.67 


18.02 


18.41 


19.20 


SSIM 


0.521 


0.553 


0.570 


0.614 


Marseille 


SNR (dB) 


18.53 


19.27 


20.55 


20.77 


SSIM 


0.797 


0.802 


0.824 


0.866 


henna 


SNR (dB) 


17.69 


19.63 


21.79 


22.13 


SSIM 


0.496 


0.583 


0.671 


0.695 


Peppers 


SNR (dB) 


21.23 


21.64 


22.40 


22.67 


SSIM 


0.754 


0.781 


0.807 


0.811 



It is worth noticing that the visual quality and quantitative results show that the denoised image based 
on the MMSE estimate of the wavelet frame coefficients is better than the one obtained with the Wiener 
filtering or the variational approach. For the latter approach, it must be emphasized that the choice of 
the hyper-parameters always constitutes a delicate problem, for which our algorithm brings a numerical 
solution. 
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2) Example 2: 

In this experiment, we are interested in recovering an image (the Straw image of size 128 x 128) from 
its noisy observation affected by a noise n uniformly distributed over the centered £ p ball of radius r\ 
when p € {1, 2, 3}. The translation invariant wavelet transform was used as a frame decomposition with 
a Symmlet filter of length 8 over 3 resolution levels. The £ p norm (p € {1, 2, 3}) was used for N(-) in 
(01). Figs. [9] (a) and [9] (b) show the original and noisy images using a uniform noise over the £2 ball of 
radius 1600. Figs. [9] (c) and [9] (d) illustrate the denoising strategies based on the variational approach 
and the MMSE estimator using frame coefficients sampled with our algorithm. 

Table [IV] illustrates the SNR and SSIM values for noisy and denoised images using the proposed 
MMSE estimator with uniformly distributed noise for different values of p and 77. 

TABLE IV 

SNR AND SSIM VALUES FOR THE NOISY AND DENOISED IMAGES. 





Noisy 


Wiener 


Variational 


MCMC 


77 = 300000 
p=l 


SNR (dB) 


15.56 


16.42 


16.67 


18.11 


SSIM 


0.719 


0.705 


0.730 


0.755 


77 = 3000 
p = 2 


SNR (dB) 


16.46 


17.03 


17.84 


19.02 


SSIM 


0.749 


0.720 


0.758 


0.796 


r? = 700 
P = 3 


SNR (dB) 


16.14 


17.05 


17.65 


19.29 


SSIM 


0.734 


0.720 


0.671 


0.771 
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This second set of image denoising experiments shows that the proposed approach performs well when 
using different kinds of frame representations and various noise properties. 

(a) (b) 




Fig. 5. Original Boat image (a), noisy image (b), denoised images using a variational approach (c) and the proposed MMSE 
estimator (d). 
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VI. Conclusion 

This paper proposed a hierarchical Bayesian algorithm for frame coefficient from a noisy observation 
of a signal or image of interest. The signal perturbation was modelled by introducing a bound on a 
distance between the signal and its observation. A hierarchical model based on this maximum distance 
property was then defined. This model assumed GG priors for the frame coefficients. Vague priors were 
assigned to the hyper-parameters associated with the frame coefficient priors. Different sampling strategies 
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were proposed to generate samples distributed according to the joint distribution of the parameters and 
hyper-parameters of the resulting Bayesian model. The generated samples were finally used for estimation 
purposes. Our validation experiments showed that the proposed algorithms provide an accurate estimation 
of the frame coefficients and hyper-parameters. The good quality of the estimates was confirmed on 
statistical processing problems in image denoising with multivariate noise uniformly distributed on some 
given ball. Despite its interest in dealing with bounded errors, this model was fewly investigated in the 
wavelet denoising literature. 

The hierarchical model studied in this paper assumed GG priors for the frame coefficients. However, the 
proposed algorithm might be generalized to other classes of prior models. Another direction of research 
for future work would be to extend the proposed framework to situations where the observed signal is 
degraded by a linear operator (e.g. blur operator). 

Appendix A 
Sampling on the unit l p ball 

This appendix explains how to sample vectors in the unit t v ball (p e]0, +00]) of M L . First, it is 
interesting to note that sampling on the unit ball can be easily performed in the particular case p = +00, 
by sampling independently along each space coordinate according to a distribution on the interval [—1,1]. 
Thus, this appendix focuses on the more difficult problem associated with a finite value of p. In the 
following, || • ||p denotes the £ p norm. We recall the following theorem: 

Theorem A.l [48] 

Let A = [A±, . . . , Ai'] T be the random vector of i.i.d. components which have the following GG(p 1//p ,p) 
pdf 

n 1 ~ 1 /p f \q s \V\ 

/w = lwrt exp ( ~T )• <28) 

Let U = [Ui, . . . , Ul'] t = A 1 1| A \\ p . Then, the random vector U is uniformly distributed on the surface 
of the l p unit sphere of M L and the joint pdf of U\, . . . , Ul>-\ is 

p^TtL'/p) ( L ^ \ (1 " PVP 

/(m, . . . = 2 L>-i(r(i/ p ))L> ( 1 ~ 12 K1 P ) 1d„ £ ,(ui,...,«j/-i) (29) 



k=l 



where D p L > = {(u u u^.j) G R 1 '" 1 | Y.k=i H\ p < 1}- 

The uniform distribution on the unit £ p sphere of IR L will be denoted by U(L',p). The construction of 
a random vector distributed within the l v ball of ]R L with L < V can be derived from Theorem IA.1I as 
expressed below: 



November 16, 2009 



DRAFT 



28 



Theorem A.2 [48] 

Let U=[Ui,..., Uv] T ~ U(L',p). For every L G {1, . . . , l! - 1}, the pdf of V = [E/i, . . . , t/ L ] T is 
given by 

p^rfL'/p) / L \^'- L)/p - 1 

- , «!-) = 2 L (r(1/p)) L r((L ,_ L)/p) ^ - E l«*l P J liw(«i> -> (3°) 

In particular, if p G N* and L' = L + p, we obtain the uniform distribution on the unit ^ p ball of 
Sampling on an t v ball of radius 7] > is straightforwardly deduced by scaling V. 
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